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ABSTRACT. The performance of neural decoders can degrade over time due to 
nonstationarities in the relationship between neuronal activity and behavior. In 
this case, brain-machine interfaces (BMI) require adaptation of their decoders to 
maintain high performance across time. One way to achieve this is by use of pe- 
rt ■ riodical calibration phases, during which the BMI system (or an external human 
^ ' demonstrator) instructs the user to perform certain movements or behaviors. This 

approach has two disadvantages: (i) calibration phases interrupt the autonomous 

■ operation of the BMI and (ii) between two calibration phases the BMI perfor- 
mance might not be stable but continuously decrease. A better alternative would 
be that the BMI decoder is able to continuously adapt in an unsupervised manner 

r ^ ■ during autonomous BMI operation, i.e. without knowing the movement inten- 

tions of the user. 

In the present article, we present an efficient method for such unsupervised 
CZ3 ' training of BMI systems for continuous movement control. The proposed method 

utilizes a cost function derived from neuronal recordings, which guides a learn- 
ing algorithm to evaluate the decoding parameters. We verify the performance of 
our adaptive method by simulating a BMI user with an optimal feedback control 
, model and its interaction with our adaptive BMI decoder. The simulation results 

\& • show that the cost function and the algorithm yield fast and precise trajecto- 

ries towards targets at random orientations on a 2-dimensional computer screen. 
For initially unknown and nonstationary tuning parameters, our unsupervised 
' method is still able to generate precise trajectories and to keep its performance 

, stable in the long term. The algorithm can optionally work also with neuronal er- 

■ ror signals instead or in conjunction with the proposed unsupervised adaptation. 
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1. INTRODUCTION 

Brain Machine Interfaces (BMI) are systems that convey users brain signals 
into choices, text or movement |[T] |2j |3j 51 Q. Being still in development, BMI 
systems can potentially provide assistive technology to people with severe neuro- 
logical disorders and spinal cord injuries, as their functioning does not depend on 
intact muscles. For motor control tasks, parameters of intended movements (e.g. 
movement direction or velocity) can be decoded from electrophysiological record- 
ings of individual neurons (6j|3, from local field potentials inside (8]|9l and on 
the surface of the cerebral cortex iTTOl ITT1 [121 PT31 [141 or from electrical fields on 
the scalp |[T5l[T6l[T7l . The decoded parameters can be used for online control of 
external effectors U8l[T9j|20j|3. 

The relation between recorded brain activity and movement is subject to change 
as a result of neuronal adaptation or due to changes in attention, motivation and 
vigilance of the user. Moreover, the neural activity-movement relationship might 
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be affected by changes in the behavioral context or changes in the recording. All 
these nonstationarities can decrease the accuracy of movements decoded from the 
brain-activity. A solution to this problem is employing adaptive decoders, i.e. de- 
coders that learn online from measured neuronal activity during the operation of a 
BMI system and that track the changing tuning parameters ll2Tl[T5Tl . 

Adaptive BMI decoders can be categorized according to which signals are em- 
ployed for adaptation: Supervised adaptive decoders use user's known movement 
intentions in conjunction with corresponding neuronal signals. During autonomous 
daily operation of the BMI systems, however, neither the user's precise movement 
intention nor his movement goal is known to the BMI decoder - otherwise one 
would not need a decoder. Therefore, supervised decoders can only adapt during 
calibration phases, where the BMI system guides the user to perform pre-specified 
movements. Unsupervised adaptive decoders, in contrast, track tuning changes au- 
tomatically without a calibration phase. They can for example benefit from multi- 
modal distributions of neuronal signals to perform probabilistic unsupervised clus- 
tering Il22ll23ll24ll25l . Evidently much less information is available to the adap- 
tation algorithm in the unsupervised case compared to the supervised case. Unsu- 
pervised decoders, hence, might not work for strong nonstationarities and might 
be less accurate and slower during adaptation. The third category, namely error- 
signal based adaptive decoders, do not use an informative supervision signal such 
as instantaneous movement velocity or target position but employ neuronal evalu- 
ation (or error) signals, which the brain generates e.g. if the current movement of 
the external effector is different from the intended movement or if the movement 
goal is not reached Il26ll27ll28l . Unsupervised and error-based adaptive decoders 
are applicable during autonomous BMI control in contrast to supervised adaptive 
decoders. 

1.1. Related work brain machine interfaces. In earlier work, BMI research has 
already addressed online adaptivity issue. For instance, Taylor et al. ||2T1 has 
proposed a BMI system, where individual neuron's directional tuning changes are 
tracked with online adaptive linear filters. Wolpaw and McFarland have shown that 
intended 2-dimensional cursor movements can be estimated from EEG recordings 
|[T5l . In that study, they employed Least Mean Squares (LMS) algorithm to update 
the parameters of a linear filter after each trial. Later, Wolpaw et al. has also 
shown that a similar method can be used to decode 3-dimensional movements from 
EEG recordings ll29l . Vidaurre et al. have proposed adaptive versions of Linear 
Discriminant Analysis (LDA) and Quadratic Discriminant Analysis for cue-based 
discrete choice BMI-tasks ll30ll3Tll25ll . These works employ supervised learning 
algorithms, i.e. they necessitate that the decoder knows the target of the movement 
or the choice in advance and adapts the decoding parameters. In other words, the 
employed methods know and make use of the true label of the recorded neural 
activity. 

More recently, DiGiovanna et al. (32), Sanchez et al. J33] , Gage et al. BH 
have proposed co-adaptive BMIs, where both subjects (rats) and decoders adapt 
themselves in order to perform a defined task. This task is either a discrete choice 
task like pushing a lever Il33l 1321 or a continuous estimation task such as repro- 
ducing the frequency of the cue tone by neural activity Il34l . Gage et al. employ a 
supervised adaptive Kalman filter to update the decoder parameters that match the 
neural activity to cue tone frequency. DiGiovanna et al. and Sanchez et al. utilize 



UNSUPERVISED ADAPTATION OF BRAIN MACHINE INTERFACE DECODERS 3 

a reward signal to train the decoder. The reward signal is an indicator of a suc- 
cessful completion of the discrete choice task. The decoder adaptation follows a 
reinforcement learning algorithm rather than a supervised one. Whether the target 
has been reached, however, in contrast to a fully autonomous BMI task, is known 
to the decoder. 

Error related activity in neural recordings [1351 1361 is very interesting from a 
BMI perspective. In both discrete choice tasks and cursor movement tasks, EEG 
activity has been shown to be modulated, when subjects notice their own errors in 
the given tasks lfl6l l37l . The modulation of the neural activity is correlated with 
the failure of the BMI task, and hence, can be used to modify the decoder model. 
With reliable detection of error related activity, the requirement for the decoder 
to know the target location could be removed. Instead, the error signal could be 
utilized as an inverse reward signal ll38l l39l . An unsupervised, i.e. working in 
complete absence of a supervision or error signal, approach has also been taken for 
an EEG-based BMI binaiy choice task. Blumberg et al. have proposed an adap- 
tive unsupervised LDA method, where distribution parameters for each class are 
updated by the Expectation-Maximization algorithm ll22l . More recently, unsuper- 
vised LDA has also been applied to an EEG based discrete choice task |[23ll24ll25l . 
Unsupervised LDA, however, is limited to finite number of targets. In other words, 
it can not be applied to BMI tasks where possible target locations are arbitrarily 
many and uniformly (or unimodal) distributed. Kalman filtering methods for unsu- 
pervised adaptation after an initial supervised calibration have also been proposed 
for trajectory decoding tasks Il40l |4T1 1421 . These methods adapt by maintaining 
consistency between a model of movement kinematics and a neuronal encoding 
model. 

1.2. Optimal control theory for motor behavior. Motor behavior and associated 
limb trajectories is most commonly and successfully explained by optimality prin- 
ciples that trade off precision, smoothness or speed against energy consumption 
ll43l . This trade off is often expressed as a motor cost function. Within the opti- 
mality based theory motor behavior, open loop and feedback optimization compose 
two distinct classes of motor control models. The former involves the optimization 
of the movement prior to its start ignoring the online sensory feedback, whereas the 
latter incorporates a feedback mechanism and intervenes with the average move- 
ment when intervention is sufficiently cheap. Optimal feedback control (OFC) 
models explain optimal strategies better than open loop models under uncertainty 
ll44l . OFC models also provide a framework, in which high movement goals can 
be discounted based on online sensory input flow Il43l . Optimal feedback con- 
trol usually accommodates a state estimator module, e.g. a Kalman filter, and a 
Linear-Quadratic controller, which expresses the motor command as a linear map- 
ping of the estimated state Il45l . The state estimator uses sensory feedback as well 
as the afferent copy of the motor command. The motor command is a feedback 
rule between the sensoiy motor system and the environment. OFC models obey 
the minimal intervention principle, i.e. they utilize more effort and cost for rela- 
tively unsuccessful movements in order to correct for the errors Il44ll46l . Minimal 
intervention principle is also very important for the current work, as substantial 
deviations can result from both noise and a model mismatch between the organism 
and the environment. The non-minimal intervention, hence, can be interpreted as 
a sign of a possible model mismatch between a BMI user and the decoder. Recent 
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evidence indicates that OFC should also model trial-by-trial and online adaptation 
in order to be plausible empirical evidence on motor adaptation II471I481 . 

1.3. Scope and goals of our research. During autonomous operation of a BMI 
system, the BMI decoder does not know the individual movement intentions of the 
subject nor the goal of the movement, apart from what can be derived from the 
measured brain activity and from sensing the environment. Hence, the decoder has 
no access to an explicit supervision signal for adaptation. We, therefore, developed 
an algorithmic framework for adaptive decoding without supervision in which the 
following adaptive decoding strategies could be implemented: 

(1) Unsupervised, here the adaptation works using exclusively the neuronal 
signals controlling the BMI movements. 

(2) Error signal based, the adaptation uses binary neuronal error signals which 
indicate the time points where the decoded movement deviates from the 
intended movement more than a certain amount. 

(3) Unsupervised + error signal based, the combination of the adaptive mech- 
anisms of (i) and (ii). 

With a BMI system involving those strategies, lifelong changes in brain dynam- 
ics do not have to be tracked by supervised calibration phases, where users would 
go under attentive training. Instead, decoder adaptation would track possible model 
mismatches continually. The BMI users behavior could provide a hint to the de- 
coder even without an explicit supervision signal. It is presumable that inaccurate 
movements result in corrective attempts, which in turn increase control signals and 
control signal variability. Optimal feedback control models, which widely explain 
human motor behavior, support this presumption as they would generate jerky and 
larger control signals under mismatches between the users and the systems tun- 
ing parameters. Here, we develop a cost measure for online unsupervised decoder 
adaptation, which takes the amplitudes and the variations in the user's control sig- 
nals into account (strategy i). Our unsupervised method incorporates a log-linear 
model that relates the decoding parameters to the cost via meta-parameters. Ran- 
domly selected chosen parameters are tested during also randomly chosen explo- 
ration episodes. In the rest of the time, the best decoding parameters according 
to the existing model (initially random) are used. The switch between these ex- 
ploration and exploitation episodes is random and follows an e-greedy policy (see 
section |2) . Harvested rewards for all episodes and associated decoding parameters 
compose the training data, from which meta-parameters are detected using the least 
squares method recursively. Note that we utilize the same algorithm for strategies 
(ii) and (iii). In strategy (ii), we employ the error signal as the cost instead of the 
derived one. In strategy (iii), a combination of both measures serves as the cost. 

2. Methods 

2.1. Simulated task. The user's task is to move a cursor on a 2-dimensional 
screen from one target to the next. Each new target is located randomly on a circle 
of 0.2 m radius around the previous target (figure [Q. If the user reaches the target 
within 4 seconds and stays there for 0.16 seconds, the trial is successful. After an 
unsuccessful trial, the users selects a new random target. Upon success, the trial 
immediately ends and the user selects a new target again. The state of the controlled 
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target 

0.05 m 




Figure 1 . BMI task: The user has to move the computer cursor 
towards the target. The target is 0.2 meters away, at a random 
direction. The target location is decided by the user and unknown 
to the decoder, also for training purposes. The target has to be 
reached in 4 seconds and the cursor has to stay on the target for at 
least 0.16 seconds. Upon both success or failure, the user selects 
a new target. 

system, i.e computer screen and cursor, at a discrete time step, t, is given by 

a* = (pl,Pt.,vl,vf,gl,gf) T . 

where, v\ „ p\ and g\ are horizontal cursor velocity, cursor position and goal posi- 
tion, respectively, v%„ p\ and g\ are the corresponding vertical state variables. The 
screen state evolves according to first order linear discrete time dynamics, 

(1) xt+i = Ax t + B d ut, 

where u t is the C-dimensional control signal and B d is a 6 x C dimensional decoder 
matrix. We assume that the motor command, u t , affects only the cursor velocity 
directly. Therefore, B d s first 2 and last 2 rows are 0: 
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The state transition matrix A models the temporal evolution of the screen state. It 
simply performs the operation (pj +1 ,pf +1 ) = (pl,Pt) + i v h v t)> 
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user model 



control command: 



Controller 



posterior state estimatio 1: 
Xt-d 



Decoder + Environment 
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state: Xf 




Delay + Sensory noise 



sensory feedback: Ut 



user model 



Figure 2. BMI user model and the environment. We model 
the BMI user with a stochastic optimal controller. The state esti- 
mator module is a Kalman filter that corrects the forward module 
estimation with sensory feedback. The controller generates the C- 
dimensional control command u t , which is linearly converted to 
2-dimensional cursor movement by the decoder. 



Note that the goal position remains constant within a trial and it is left untouched 
by the linear dynamics of the screen state. Including the goal position in the state 
vector, however, simplifies the formulation of control signal generation by the user 
model (section l2~2l ). 



2.2. User model: stochastic optimal controller. The BMI user is modeled as a 
stochastic optimal controller, who sends the C-dimensional control command u t at 
discrete time step t (figure [2} . The controller, i.e. the user, assumes that the screen 
state evolves according to a first order discrete time dynamics, 

(2) x t+ i = Ax t + B u ut, 

where B u is the user's estimation of the decoder matrix Bd, 
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The BMI user perceives the state of the cursor with sensory delay and normally 
distributed zero-mean noise, 



y t = Hxt-d + Vt, 
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where d is the sensory delay in time steps and rjt is the noise drawn from M(0, ^l v )- 
The user observes a 4-dimensional vector, y t , which contains the velocity and po- 
sition observations. H is therefore: 

/ 1 \ 
1 
001000 " 
\ 1 / 
In computer simulations, we use a time step of 40 ms. Sensory delay is set to 
200 ms, i.e. 5 time steps. We assume that all dimensions of r]t are independently 
normally distributed with standard deviations of (0.0004 m, 0.0004 m, 0.1 m/s, 
0.1 m/s) T . 

We model the control signals from the BMI-user as the output of a stochastic 
optimal controller. The BMI-user model aims at optimizing the cost function 

(3) J u = ^2(q \\g t - p t \\ 2 + r uju t ) , 

t 

Hfft ~ Pt\\ stands for the euclidean distance between the 2-dimensional cursor po- 
sition and the goal position vectors, q and r are constants that account for the 
relative weights of the two terms in the cost. The same cost expression can be 
written alternatively as, 

J u = ^2 ( x t Qt x t + uj R t ut) , 

t 

where Q t is a 6 x 6matrix that allows for the quadratic expression of the distance 

\ 
1 


' 


1 / 

and Rt = r I. Qt and R t stay constant for all t in our cost model, Qt = Q and 
Rt = R for all t. 

Assume that the stochastic optimal controller minimizes the cost by sending the 
optimal control command at every time step t. In fact, the optimal command is 
disturbed by noise. Here, we model the inherent noise in biological circuits with a 
0-mean normally distributed additive noise vector p t , 

u t = u* + p t . 

This noise consequently presents itself also as additive at state update in equation Q] 

x t +i = Ax t + B d u t 

= Ax t + B d u* t + B d p t 
= Ax t + B d u* t +u} t , 

where uj t ~ Af(0, The problem of computing u\ is known as Linear-Quadratic- 
Gaussian (LQG) control and can be recursively solved by an interconnected Linear- 
Quadratic-Regulator P31I491 , 
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(R + B^St+iBu) l B u St+\A 



S t = Q t + A T S t+1 (A-B u L t ), 
and a state estimating Kalman Filter, 

(4) x t+ i = Ax t + B u u* t + K t (y t -Hx t ) 



(5) 



K t = AX t H T (Hi: t H T + n v y 1 



(6) 



Opw + ASt A T -K t HX t A T . 



Here, S f is the covariance estimate of the state vector variable x t and xt is estimate 
of its mean value posterior to noisy observation y t . fipw is the covariance of the 
noise associated with the forward model prediction. Kalman filter above is a model 
for the state estimator in user's motor control circuitry. B u is the user's estimation 
for Bd- When B u deviates from Bd, the user's control signals are not optimal 
anymore. Above equations assume that the sensory delay equals to 1 time step. 
Larger sensory delays, e.g. d time steps, can be realized by using an augmented 
state vector, xt, which contains d + 1 states together fl44l 



- . i i i , i 

Xt — (x t , x t _ 1 , . . . , x t _ d ) . 

State transition and observation matrices are redefined for the augmented state 
space, 



A 



/AO 
I 
I 



\ 






^ ■■■ I J 



, H = (0, • • • , 0, H) and B u 



( B u \ 




in order to satisfy the system dynamics, xt+i = Ax t + B u ut- Kalman filter 
equations |H [5] and [6] are also modified according to augmented states and system 
parameters: prior state and covariance estimations before the delayed observation, 
i.e. yt-\.\ = H xt+i-d + ?7t+l> are computed using the subject's forward model, 



xj+i = Ax t + B u u* t 



J t+i 



A Ut A + f2 



FW) 
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Posterior state and covariance estimates are similarly computed using the Kalman 
gain matrix Kt+i, 

k t+l = tt +1 H T (HtfH T + n ri )~ 1 

x t +l = + K t +i (yt+i ~Hxf +1 ) 

S m = (I-K t+1 H)±t+v 

Note that in our simulations, Opw is set to a diagonal matrix, whose first 4 diagonal 
entries are the squares of the noise standard deviations, (0.0025 m, 0.0025 m, 
0.625 m/s, 0.625 m/s) T , and the remaining entries are 0. 

2.3. Decoder models. The decoder is modeled by B<i, i.e. it decodes velocity 
information from the neuronal control signal u t . This decoder matrix might de- 
viate from the user's decoder matrix B u , on the basis of which he generates his 
control signals. Therefore, the proposed adaptive decoders adapt their accord- 
ing to B u . In the current section, we describe three decoders: Our recursive least 
squares (RLS) based learning algorithm with unsupervised and error-signal based 
cost functions as well as a supervised RLS filter for performance comparison. 

2.3. 1. Unsupervised learning algorithm. For unsupervised and error-signal based 
decoder adaptation, we define a cost function and estimate B u by optimizing the 
proposed cost function. In the unsupervised setting, the cost is associated with 
control signal, 

(7) 4= £ uju t , 

t=n-T+l 

Here, uj stands for the transpose of the control command vector, t and n are in- 
dices over time steps. T is the number of time steps in the control signal history for 
computing the cost function. Note that the decoder needs to know only the control 
signal, u t , in order to compute the above cost function. This cost function reflects 
the control-related term of the user's cost function (equation [3]). The value of the 
cost function is expected to be high, if the user aims at correcting the movement 
errors which result from a model mismatch between the user and the decoder, i.e. 
between B u and B^. 

We name the cost in equation ^amplitude cost, as it is based on the amplitudes 
of the control commands. We, however, propose a further cost function that can 
be utilized for decoder adaptation, namely deviation cost. Deviation cost uses the 
variances of the control signals across time instead of the summed squared norms 
of the control commands, 

C n 

(8) JL(n)=J2 E K- W " c ) 2 > 

c=l t=n-T+l 

where c is an index over control channels u c t is the control command at channel 
c at time step t. u c ) is the mean value of u\ for channel c across the interval 
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[n — T + 1, n]. A weighted sum of the above costs can also be used as the cost 
function, 

n C n 

(9) Jam P l+dev(n) = £ u I u t+Z^ ^ ~ ^ 

t=n~T+l c=l t=n-T+l 

where Z is a constant for weighting the contributions from each individual cost 
type. 

Alternatively, in case neuronal evaluation signals (i.e. error signals) are available 
in the recordings, we use the number of errors over a finite number of discrete time 
steps as cost, 

n 

(10) 4= Yl err t- 

t=n-T+l 

We simulated the neuronal error signal by assuming that neuronal error signals are 
generated if the deviation between intended and performed velocities exceeds a 
certain amount, 

f 1, for cos(v?,v t ) < cos(20°) 

QTTf = s 

[0, for cos(v*,v t ) > cos(20°), 

where v% is the intended velocity. err t is swapped probabilistically with a proba- 
bility of k in order to reflect the reliability of error signals. Note that similar binary 
movement mismatch events are also recorded in human ECoG Il28l , though 20° in 
our simulation was arbitrarily choosen (see discussion). 

We assume a log-linear model for the decoder cost. Let ft be the parameter 
vector generated by the horizontal concatenation of the third and fourth rows in 
matrix, i.e. ft = [B^, B^]- The model estimates the decoder cost as, 

(11) J d = exp(-[/3 T b]w), 

where b is a constant bias value concatenated to the flattened decoder parameter ft 
and it; is the column vector of the meta-parameters of this log-linear model. We 
denote the — log of the decoder cost by I, 

L = - log(^) = ffi b] W n = ft'Jw n . 

Let [ft T b] = ft' n T . Here, the task is to learn w from explored ft and J d collec- 
tions and to simultaneously optimize ft for a given w. Note that for a given w, the 
cost-minimizing ft would go the infinity, since — log-cost linearly depends on w. 
Therefore, the minimization is performed on the unit circle, i.e. \ft\ = 1. The mo- 
tivation here is to generate trajectories in the right direction rather than to optimize 
the speed of movement. The goal of the unsupervised as well as the error-signal 
based learning algorithm is to minimize the summed squared error, 

n 

(12) £n = X>*-*e2, 

k=l 

where ej. = (if. — 1%) = (Ik — ft'^Wn)- n is the index of the current time step 
and k is an index over past time steps. A is a constant for degrading the relative 
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contribution of the past time steps (0 < A < 1). £ n can be further expressed as, 

n 

= E Xn ~ k ^l ~ 2w nP'kh + wlft k ftjw n ). 
k=l 

Optimum parameters can be found by solving 

n 

V w J n = £ A"" fc (-2^4 + 2ft k ft k T w n ) = 0. 

k=l 



Defining 



£ X n ~ k ft k £ k = @ n and £ A"- fc /3^ T = * n , 

fc=l k=l 

solution to V m „^ n = can be found as 

Vwjn = = -2(G n - * n ™ n ) 

Utilizing matrix inversion dilemma, Recursive Least Squares (RLS) [50] algorithm 
proposes a recursive formulation for if -1 

tf" 1 = P n = \~\Pn-l ~ k n ftJP n -l), 

where 

™n 



X + P'JPn-lftn 

Our method aims at simultaneous harvesting of various decoding parameters Bj 
and, hence, j3 and detecting optimum meta-parameters w. These subtasks corre- 
spond to exploration and exploitation phases of a reinforcement learning algorithm, 
respectively. We employ e-greedy exploration policy. In other words, with a pre- 
defined probability, e, the algorithm prefers exploring the parameter space, which 
means a new /3 is chosen randomly. Otherwise, i.e. with a probability of 1 — e, the 
algorithm uses the best decoding parameters, i.e. the beta that minimizes the esti- 
mated decoder cost (equation [TTb . Given w, the current estimate of w, the optimal 
unit normed j3 is computed by finding argmax|y3| =1 ft T w. This is equivalent to 
maximizing the cosine between ft and w by setting ft = w and normalizing the 
corresponding ft A pseudocode for the algorithm is sketched in table I2.3.T1 

2.3.2. Adaptive supervised recursive least squares filtering. Under the assumption 
that the decoder knows the intended movements of the user, B& can be adapted to 
B u by utilizing an RLS filter. Let v intent be the intended velocity of the user at 
time step t. 

„, intent o/ 

v t - ^u u t, 

B U 3 



B 



where B' u is the submatrix of the third and fourth rows of B u , i.e. B' u - 
The supervised decoder estimates the intended velocity using Bd, 

"intent ryf 

V t ~ B d U t- 

For the supervised decoder, it is assumed that the user's intent, v %ntent , is known 
to the decoder. The supervised RLS learning algorithm infers B u online from 
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RLS based algorithm for continual unsupervised adaptation 
of the decoding parameters 



for time step n at every T time steps do 

# select B 

if random > e 

P n 4- argmax|£| =1 /3 /T tt> n _ r 

else 

P — > random 

endif 

# make prediction on — Zo^-cost 

i n = f3' T W n -T 



#observe the — log-cost of the last T time steps from 
user's ut 

£ n = -log(^) = -log(E?=„-T+i^ T «t) 

#or alternatively according to equation [10] 

#compute the prediction error 

#compute the innovation gain 

, _ P n - T p' n 

\+P' n L P n -T P' n 

#update meta parameters 

w n = w n -T + k n e n [50l 



#update inverse of the correlation matrix 

P n = \- 1 (P n - T -k n f3> n T P n _ T ) 

endfor 

Table 1. A sketch of the unsupervised learning algorithm via RLS 



yintent _ ^intent _ supervised adaptive decoder is used to benchmark the pro- 
posed unsupervised and error-based adaptive decoders. The supervised RLS method 
is described in table !2.3.2l Note that P" stands for the inverse of the C x C sample 
correlation matrix for (uQ...u t ). X SU p is the forgetting parameter of the supervised 
algorithm and is set to 1. Pq is set to 100 /. 
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RLS algorithm for supervised adaptation of the decoding 
parameters 



for every time step t do 



# make prediction on v. 



intent 



ji intent id! „, 

h — D d a t 



#observe user's v ™ tent 



#compute the prediction error 

Antent ..intent ~ intent 

e t — v t ~ v t 



#compute the innovation gain 

* \sup + uj P? Ut 

#update B' d (and hence B^) matrix 

B'^B'+ ej ntent k t T 



#update inverse of the correlation matrix 

pu _ \ -1 ( pu _ u ,.T pu\ 



endfor 

Table 2. A sketch of the supervised RLS algorithm 



2.4. Simulation Procedures. We simulated the interaction of the optimal feed- 
back controller with different adaptive decoders described in section [2] The be- 
havior of the BMI user was simulated using the framework of stochastic optimal 
feedback control which has been shown to provide a good model for human motor 
behavior in various motor tasks ll44ll5Tll48l . The combined system of the optimal 
controller and the adaptive decoder was simulated at 40 ms time steps and we used 
a sensory delay of 200 ms. The user's task was to control a mouse cursor. The 
user selects a target at 0.2 m distance with a random orientation at each trial. The 
user has to reach the target within 4 seconds and stay at the target for at least 0.16 
seconds. Upon both success or failure, the user selects a new target. The distance 
cost parameter q and control signal cost parameter r are both set to 0.02. We set 
Up to 8 x 10 6 /, so that the cursor speed-noise had an average standard deviation 
of 0.0625 m/s over a uniform distribution of unit normed f3 vectors. The variance 
value was manually adjusted to obtain the aimed speed noise by testing on 10 4 unit 
normed random (3 vectors. 

Note that the decoder does not have the information whether a trial is finished 
or continuing, nor does it know the target of the cursor movement. We simulated 
and evaluated the following adaptive decoders: 

Unsupervised:: The decoder learns exclusively from continuous neuronal 
control signals of the user according to equation |7] without any additional 
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Figure 3. The trajectories for random decoders (A) and super- 
vised adaptive decoders (B) during decoder-freeze, i.e. after de- 
coder exploration and adaptation have been switched off for per- 
formance evaluation. Magenta thick curves indicate the failed tra- 
jectories. Each plot depicts the trajectories of 50 training simula- 
tions, each at trial 1501. The 50 different targets and trajectories 
at trial 1501 are rotated to the same orientation (^) for a better 
visual evaluation. 



information. Note that the decoder knows neither whether the target has 
been reached nor when a trial finishes. 

Error signal based:: The adaptation uses binary neuronal error signals which 
indicate the time points where the decoded movement deviates from the 
intended movement more than 20°. The reliability of the neuronal error 
signal was mainly assumed to be 80%, i.e. swapping probability, k, was 
0.2. The effect of various k on the decoding performance, however, was 
also investigated in section [331 

Unsupervised + error signal based:: The combination of the unsupervised 
and the error-signal based decoders, i.e. £ n was a linear combination of the 
unsupervised l n and the error-signal based l n . 

For all of the above algorithms, the current cost is computed from the last 100 
time steps (T = 100). This corresponds to a parameter update period of 4 seconds. 
A of equation [T2l was set to 1, i.e. no gradual discount of the parameter history 
was performed. Exploration rate was 0.4, i.e. e = 0.4. We simulated 50 random 
instantiations of all these unsupervised and the error signal based adaptive decod- 
ing algorithms. 1501 successive trials of target reaching were simulated for each 
instantiation. Note that from trial 1463 on the adaptation of the decoding algo- 
rithms was frozen and the current optimal decoding parameters were used for the 
last 39 trials (decoder-freeze). We evaluated their performances and compared it to 
the performance of a supervised adaptive decoder where the adaptation is based on 
perfect knowledge of the intended movement velocity at each time step (see sec- 
tion l2.3.2l) . Such a supervised adaptive decoder yields the best possible adaptation, 
however, it assumes knowledge that is certainly not available during autonomous 
BMI operation. In addition, we also compared the performance of our adaptive 
decoders to the performance of a static untrained random decoder. 
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unsupervised decoder (ampi cost) 
100 % reach target in 0.72 sec (avg) 



Q error signal (80% reliability) 

100 % reach target in 0.61 sec (avg) 



Q unsupervised + error signal 

100 % reach target in 0.52 sec (avg) 
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Figure 4. The trajectories for different strategies and their vari- 
ations during decoder-freeze. Magenta thick curves indicate the 
failed trajectories. Each plot depicts the trajectories of 50 train- 
ing simulations, each at trial 1501. Note that not only trial 1501 
included 50 simulations, but the whole history of 1501 trials are 
simulated 50 times with random initial tunings. The 50 different 
targets and trajectories at trial 1501 are rotated to the same orien- 
tation (^) for a better visual evaluation. 



Our findings show that all the decoders described in section 12.41 can rapidly 
adapt to accurate cursor control from totally unknown tuning of the neuronal sig- 
nals to movement velocity whereas the random decoder fails to reach the target 
(figure |3K). Although trajectories of the unsupervised and error-based decoders af- 
ter adaptation are more jerky compared to the supervised case, they are still mainly 
straight and yield a high target hit rate of nearly 100% (figure |4|. These results 
show that decoders can be trained during autonomous BMI control in the absence 
of any explicit supervision signal. 

3.1. Comparison of different adaptation algorithms. As a baseline for compar- 
isons, we implemented the supervised decoder that knows the intention of the user 
and fits the decoder parameters, based on this intention (see section 12.3.2b . 
Though unrealistic, this learning scheme is obviously the most successful of the 
presented methods (figure[3}3). In order to compare different algorithms, we utilize 
a measure that counts for the cumulative distance to the movement target and call 
it cumulative error, 



where g t and p t and are the 2-dimensional target and cursor position vectors at time 
step t of trial m, respectively. M m is duration of trial m in time steps. For a more 
intuitive interpretation of the given error measure, we present out results in terms 
of relative cumulative error, which is the normalized cumulative error with respect 



3. Results 
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Figure 5. Evolution of the relative cumulative error (RCE) for 
the unsupervised strategy (A). The plot shows the RCE with re- 
spect to trial number for a single simulation (gray) and the Me- 
dian RCE (MRCE) for 50 simulations with random initial tuning 
parameters (red with marker). Zoom into RCE and MCE curves 
for early (A) and late (B) learning and during decoder-freeze (C). 



to average cumulative error of the supervised decoder after adaptation, 

Cumulative Error 

Relative Cumulative Error 



Mean Supervised Cumulative Error 

Figure |5K depicts the evolution of the relative cumulative error for a single sim- 
ulation (gray) of the unsupervised algorithm and median relative cumulative errors 
(MRCE) for 50 randomly initialized simulations (red, *). The jumps in the gray 
curve correspond to exploration periods, where random decoder matrices, Bd, are 
explored and evaluated. The relative cumulative error shows different characteris- 
tics for early learning (|5}3), late leamingd53l!) and decoder-freeze (03) phases. The 
early learning phase was investigated to compare the learning speeds of different 
algorithms, whereas the late learning phase shows the saturated final performance 
of the algorithms when adaptation continues. During decoder-freeze (last 39 time 
steps), the adaptation (also the exploration) was stopped and the final performance 
of the decoder was evaluated. No jumps in the performance are observed anymore 
due to absence of exploration. 

Figure [6] shows the comparison between supervised, error signal based and un- 
supervised algorithms. MRCEs across 50 runs are shown for the entire simula- 
tion (A), for early learning (B), for late learning (C) and during decoder-freeze 
(D) (figure [6]). Distribution of the relative cumulative errors for the individual 
phases (figure [6] E, F, G) reveal that the supervised algorithm is superior to the 
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Figure 6. Comparison of the unsupervised (black), the error sig- 
nal based (80% reliability, magenta), their combination (green) 
and the supervised (cyan) strategies. Medians of relative cumula- 
tive errors of 50 simulations from each group for all of the trials 
(A). Zoomed medians for early (B) and late (C) learning and dur- 
ing decoder-freeze (D). The distributions of the relative cumula- 
tive errors for each of the phases (E, F, G). The rightmost values of 
the distribution plots denote the total relative counts of the outlier 
values that are greater than the associated a>axis value. The num- 
ber of outlier values decreased across trials, i.e. it was the high- 
est during early learning and zero during decoder-freeze. Outliers 
correspond to the failed trajectories in figures @]and H 



other algorithms in all phases (p < 0.01, Wilcoxon rank sum test). In all of the 
phases, the combination of the error signal based and the unsupervised strategies 
yielded a significantly lower cumulative error than the individual strategies alone 
(p < 0.01, Wilcoxon rank sum test). The performance of the the error-based learn- 
ing was significantly better than the unsupervised strategy also for all of the phases 
(p < 0.01, Wilcoxon rank sum test). Note that the trajectories reached the targets 
not only during decoder-freeze (figure |4) but also mostly in the late learning phase 
(figure [7]), where exploration can occasionally cause some trajectories to deviate 
strongly from a straight line towards the target. 
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Figure 7. The trajectories for different strategies and their varia- 
tions in the late learning phase. Magenta thick curves indicate the 
failed trajectories. Each plot depicts the trajectories of 50 train- 
ing simulations, each at trial 1404. Durations and target hit rates, 
however, are computed from pooled trajectories of 5 consecutive 
trials (1402-1406, totally 250 trajectories). 

3.2. Effect of the parameter update period. We varied the parameter update 
period, T, between 1.2 and 10 seconds in order to check the stability of the unsu- 
pervised strategy with respect to this parameter (figure |T}. Our results show that 
for all tested update periods greater than or equal to 2.4 seconds, the performance 
depended only weakly on exact value of the update period. Though the perfor- 
mance for an update period of 2.4 s was significantly (p < 0.05, Wilcoxon rank 
sum test) inferior compared to an update rate of 4 s or 10 s in both late learning 
and during the freeze of the decoder, the difference was minor. Moreover, the per- 
formance of the update periods 4 s and 10 s were not significantly different during 
decoder-freeze (p > 0.05, Wilcoxon rank sum test). We, therefore, conclude that 
our algorithm is robust against the update rate as long as it is high enough and used 
an update period of 4 seconds for all remaining simulations. 

3.3. Effect of Error Signal Reliability. Error signal based decoder performance 
obviously depends on the reliability of the error signals. Our results so far used 
an error signal with 80% reliability, i.e. k = 0.2. Although several studies have 
shown evidence on neuronal error signals lT35l 136] |26] 1271 . conclusive quantita- 
tive data on the reliability of the error signals is still missing. To compute the 
dependence of the error-based adaptive decoder on k, we varied it between and 
0.8. Our findings show that the reliability must be greater than 50% for successful 
adaptation (figure |9K,B,C,D). Reliabilities of 80% and 100% yielded statistically 
indistinguishable performance during decoder-freeze (p > 0.05, Wilcoxon rank 
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Figure 8 . Comparison of relative cumulative error measures us- 
ing the unsupervised strategy for different update periods (T). The 
algorithm updates the decoding parameters either every 1.2 sec- 
onds (black) or 2.4 seconds (magenta) or 4 seconds (green) or 10 
seconds (cyan). Medians of relative cumulative errors of 50 simu- 
lations from each group for all of the trials (A). Zoomed medians 
for early (B) and late (C) learning and during decoder-freeze (D). 
The distributions of the relative cumulative errors for each of the 
phases (E, F, G). The rightmost values of the distribution plots de- 
note the total relative counts of the outlier values that are greater 
than the associated x-axis value. The number of outlier values de- 
creased across trials, i.e. it was the highest during early learning 
and the lowest during decoder-freeze. 



sum test). Though 80% was slightly yet significantly better than 100% in the late 
learning phase (p = 0.049). A decoder with a reliability of 60% yielded a signifi- 
cantly inferior performance in all of the phases to the decoders with 80% and 100% 
reliability (figure |9j}, F, G, rank sum test, p-value < 0.05), its median error during 
late learning and freezing was only about 30% higher. Decreasing the reliability 
further to 50% drastically increased the median relative cumulative error. 

3.4. Adaptivity to nonstationary tuning. We furthermore investigated, whether 
the unsupervised adaptive algorithm can cope with continual changes in the tun- 
ing. The velocity tuning parameters of the user, i.e. f3 u , flattened third and fourth 
rows of B u , were changed after each trial according to the following random walk 
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Figure 9. Comparison of relative cumulative error measures us- 
ing the error-signal-based training strategy for different reliabil- 
ities of error signals. The error signal reliability is either 100% 
(black) or 80% (magenta) or 60% (green) or 50% (cyan) or 40% 
(red) or 20% (blue). Medians of relative cumulative errors of 50 
simulations from each group for all of the trials (A). Zoomed me- 
dians for early (B) and late (C) learning and during decoder- freeze 
(D). The distributions of the relative cumulative errors for each of 
the phases (E, F, G). The rightmost values of the distribution plots 
denote the total relative counts of the outlier values that are greater 
than the associated a>axis value. The number of outlier values de- 
creased across trials , i.e. it was the highest during early learning 
and the lowest during decoder-freeze. Outliers correspond to the 
failed trajectories in figures |4] and |7] In general, higher the re- 
liability, better the performance. A minimum reliability of 60% 
is needed for successful training. 100% and 80% reliabilities are 
statistically equivalent during decoder-freeze (rank sum test, p > 
0.05). 



model, 

Pu <~ Pu + Q, 

where g is 40-dimensional row vector whose entries are randomly drawn from a 
normal distribution, g ~ jV(0, 0.007). We put a hard limit on the magnitude of the 
entries of p u , so that they did not exceed —0.3 and 0.3. In order to investigate the 
performance of our algorithm under nonstationary tuning, 50 randomly initialized 



UNSUPERVISED ADAPTATION OF BRAIN MACHINE INTERFACE DECODERS 



21 



LU 

CD 
> 

25 
3 

E 

o 

CD 
> 



ZS 

o 
O 

CD 
> 

CD 
DC 



100 
50 



early learning first freeze 

/ late learning \ 

\ 



adaptation stopped 
■adaptation active 

second freeze- 



0. 



learning 

after first freeze 









500 1000 1500 2000 2500 3000 3500 
B C D E 



30 30 

204^f^H 20 Aj^/ 

10 10 



20 40 60 80 1$62~ 2000 3*400 3450 ° 3480 3500 



G 



0.5 



0.5 



Trial 



0.5 



H 



0.5 




5 5 

Relative Cumulative Error 



FIGURE 10. Relative cumulative errors for the unsupervised strat- 
egy under nonstationary tuning. Black curve shows the median 
RCE of 50 simulations, where adaptation was active between trial 
1 and 1961 and stopped after that. Magenta curve shows the me- 
dian RCE of 50 simulations, where adaptation was active both be- 
tween trial 1 and 1961 and after trial 2000. In both groups, adapta- 
tion was inactive between 1962 and 2000 for comparison purposes 
(A). Zoomed medians during the early learning phase (B), the first 
decoder-freeze (C), the late learning phase after the first freeze (D) 
and the second decoder-freeze (E). The distributions of the relative 
cumulative errors for each of the phases (F, G, H, I). The rightmost 
values of the distribution plots denote the total relative counts of 
the outlier values that are greater than the associated x-axis value. 
Outliers correspond to the failed trajectories in figure [TT] 



unsupervised adaptive decoders was compared to another group of 50 randomly 
initialized unsupervised decoders, for which adaptation was stopped after a certain 
amount of trials. Both decoder groups were adaptive for the first 1961 trials, at the 
end of which they reached a stationaiy performance (figure ITOK.B.F). Then, the 
adaptation of both groups was switched off during trials 1962 to 2000 (1st freeze) 
to compare the baseline performance of both decoder groups (figure [TOC.G). As 
expected, both groups performed equally well during the first 2000 trials (p > 0.1, 
Wilcoxon rank sum test). For the first group, the adaptation was then switched on 
again for the next 1462 trials, whereas for the other group the adaptation was kept 
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Figure 11. The trajectories under nonstationary tuning for un- 
supervised strategy and their variations during late learning and 
during decoder-freeze. Magenta thick curves indicate the failed 
trajectories. Each plot depicts the trajectories of 50 training sim- 
ulations. Trajectories during decoder-freeze belong to trial 3501. 
The late learning trajectories were recorded at trial 3402, durations 
and target hit rates are from pooled trajectories of trials 3402-3406. 
The 50 different targets and trajectories at the recorded trial are ro- 
tated to the same orientation (^) for a better visual evaluation. 



off. Evidently, non-adaptive decoders could not cope with the changing tuning any- 
more and the performance strongly degraded (figures IT0K.D.H andfTTK). Adaptive 
decoders, in opposite, tracked the changes in B u well and kept the performance sta- 
ble (figure [lOK.D.H). After trial 3462 a 2nd freeze phase of 39 trials was used to 
quantify the difference in performance between both groups for nonstationary tun- 
ing parameters: adaptive decoders yielded a significantly (p < 0.0001 Wilcoxon 
rank sum test) and about 7 times smaller error than non-adaptive decoders (fig- 
ure [10}\,E,I). In these simulations, we employed the unsupervised decoder cost as 
in equation [7] The simulation settings are the same as described at the beginning 
of section [3] except for A. Here, we set A = 0.995, to reduce the influence of the 
earlier trials relative to the recent ones. This improves performance as recent tri- 
als contain relatively more information on B u . Trajectories of the adaptive group 
reach very accurately to the target during decoder-freeze (figure \TVp) and less but 
also with high accuracy during the late learning phase (figure [TTB). 



3.5. Different decoder costs for unsupervised adaptation. Figure [T2B shows 
the trajectories obtained by 50 simulations of the unsupervised algorithm using 
deviation cost (equation [8) during decoder-freeze. The trajectories were precise 
and fast. 

The trajectories obtained using amplitude + deviation cost (equation© are shown 
in figure [T2tA Again, straight and fast movements were obtained. A comparison 
between the three unsupervised cost functions is presented in figure [T3] All these 
three costs yielded equal performance (p > 0.05 Wilcoxon rank sum test) during 
all phases. 



UNSUPERVISED ADAPTATION OF BRAIN MACHINE INTERFACE DECODERS 



23 




Figure 12. The trajectories for different cost measures and their 
variations during decoder-freeze. Magenta thick curves indicate 
the failed trajectories. Each plot depicts the trajectories of 50 train- 
ing simulations, each at trial 1501. Note that not only trial 1501 
included 50 simulations, but the whole history of 1501 trials are 
simulated 50 times with random initial tunings. The 50 different 
targets and trajectories at trial 1501 are rotated to the same orien- 
tation (^) for a better visual evaluation. 



4. Discussion 

Our results show that under realistic conditions, adaptive BMI decoding start- 
ing with random tuning parameters is feasible without an explicit supervision sig- 
nal. Decoding performance gradually improves across trials and reaches a value 
close to the maximum possible performance as obtained by a supervised adaptive 
decoder, which assumes perfect knowledge of the intended movement of the BMI 
user. Moreover, we propose an adaptive decoder which employs neuronal error sig- 
nals and show that this decoder yields a similar performance to our unsupervised 
adaptive decoder. Unsupervised and error-signal based decoders adapt rapidly and 
generate precise movement trajectories to the target. The suggested decoders do 
not require a supervision signal, e.g. the intended movement, and therefore can be 
used during autonomous BMI control. The suggested unsupervised adaptation is 
based on the minimization of a simple cost function, which penalizes high control 
signals and/or high variability of the neuronal control signals. The rationale behind 
these costs is, that inaccurate decoding causes corrective attempts by the BMI user, 
which in turn increase control signals and control signal variability. Therefore, ac- 
curate movement decoding corresponds to lower costs and the minimization of the 
suggested cost functions improves the accuracy of BMI movement control. Due 
to the generality of this approach we expect this to work in different kinds of mo- 
tor tasks and not only for the reaching task considered in our simulations. Note 
that the cost function could be alternatively derived from only trajectories instead 
of control signals (e.g. deviations from straight line could be punished). An ad- 
ditional argument in favor of our cost functions comes from behavioral studies of 
human motor control: A wide range of human motor behavior can be described by 
optimal feedback control (OFC) models, which minimize cost functions containing 
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Figure 1 3 . Comparison of relative cumulative errors for the un- 
supervised strategy using different cost measures. The algorithm 
employs either amplitude cost (black), or deviation cost (magenta) 
or their combination (green). Medians of relative cumulative er- 
rors of 50 simulations from each group for all of the trials (A). 
Zoomed medians for early (B) and late (C) learning and during 
decoder-freeze (D). The distributions of the relative cumulative 
errors for each of the phases (E, F, G). The rightmost values of 
the distribution plots denote the total relative counts of the outlier 
values that are greater than the associated a>axis value. The num- 
ber of outlier values decreased across trials, i.e. it was the highest 
during early learning and zero during decoder-freeze. 



the same dependence on the motor control signals as we used in our decoder cost 

Balm ED. 

Besides adaptation to unknown but static neuronal tuning to movement, we 
demonstrated that the proposed algorithms can also keep the decoding performance 
stable for nonstationary tuning. This is even possible if the tuning is not only non- 
stationary but also initially unknown. In these simulations, we assumed that the 
nonstationarity of the tuning parameters follows a random walk model and, hence, 
is independent of the decoder. If the decoded movement is fed back to the BMI 
user, the neuronal signals might adapt Il52l and the learning speed as well as the 
final accuracy might even increase beyond the presented values. 
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In order to train the decoder, we assumed a log-linear model that relates the 
decoder parameters to cost via meta-parameters. We introduced a learning algo- 
rithm, which explores the parameter space with a e-greedy policy. Our method 
performs least squares regression recursively to estimate the optimal values of the 
meta-parameters. In other words, the algorithm performs simultaneous exploration 
of the decoding parameters and recursive least squares (RLS) BUI regression on 
the decoder cost function. The same algorithm works also with neuronal error sig- 
nals, where the cost is the number of error signals detected in a given time period. 
Error related neuronal activity has indeed been recorded from the brain via EEG 
ll35l l36l l27l . functional magnetic resonance imaging (fMRI) |[26l and single-unit 
electrophysiology ll53l 1541 . Here, we assume a simple partially reliable error sig- 
nal that indicates a substantial deviation from the movement intention. Neuronal 
activity related to this kind of movement execution errors has been found in ECoG 
ll28l and in fMRI [26]. Milekovic et al. |[28l observed neuronal responses evoked 
by a 180° degree movement mismatch during continuous joystick movement in 1- 
dimension. In our simulations of 2-dimensional movements, we assumed that neu- 
ronal error signals are evoked when the deviation between intended and decoded 
movement exceeds the somewhat arbitrary threshold of 20°. Although it remains 
to be shown in future studies that neuronal error signals are indeed observable al- 
ready at this threshold, we consider this a plausible assumption and expect our 
algorithm to be robust against the exact value of the threshold. Our results show 
that the overall performance of our algorithm is robust against different parameter 
update periods (T) and different error signal reliabilities (> 50%). Arguably, the 
proposed algorithm has the potential to work with various types of neuronal error 
signals, though the computation of the cost function in terms of error signals might 
need adjustments to achieve high performance. 

An alternative to our algorithm would be to use standard reinforcement learn- 
ing algorithms and generalization methods Il55l for directly training the decoding 
parameters without using a meta-parametric model relating cost to decoding pa- 
rameters. In our practical experience, keeping a record of the previously explored 
parameters via P matrix of the RLS algorithm and relating the parameters to the 
log-cost yields good performance. A comparison of our method to different rein- 
forcement algorithms that utilize the same cost and/or other cost functions than the 
ones suggested here, are interesting topics for future studies. Previously, Kalman 
filtering methods were also applied for unsupervised adaptation during trajectory 
decoding BOl HTl l42l l56l . These methods adapt by maintaining consistency be- 
tween a model of movement kinematics and a neuronal encoding model. They 
have been shown to track nonstationarities once an initial model is learned via 
supervised calibration BD1 14T1 1421 . Our unsupervised approach in this work is 
fundamentally different from these methods. We assume that, in the aftermath 
to decoding errors, the statistics of the control signals change and this change is 
utilized for unsupervised adaptation. In the future, it would be worthwhile to com- 
pare the performance of these different methods and their robustness against model 
violations in online BMI taks. 

In principle, our adaptive decoding framework is independent of the type of neu- 
ronal signal that is used to control the movement. As neuronal control commands, 
the instantaneous firing rates of multiple single-unit or multi-unit activities could 
be used. Alternatively, filtered LFP, ECoG, EEG or MEG signals or the power 
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of LFP, ECoG, EEG and MEG signals in different frequency bands could be em- 
ployed. Our algorithms assume that neuronal control signals are linearly related to 
movement velocity. For many different neuronal signal types, indeed, movement 
trajectories can be reconstructed well using this assumption ((6l|57j|2Tl for SUA, 
lfl4l PT3l for ECoG). Linear tuning to movement position or simultaneous linear 
tuning to position and velocity can easily be implemented in our algorithms by 
straightforward modifications of the B matrices (see section Future extension 
of our algorithmic framework might also consider nonlinear tuning. The cost mea- 
sures we introduced, might need some modifications depending on the tuning of 
the recorded signals. For instance, if the control signal, e.g. firing rates for individ- 
ual recording channels, takes an all-or-none behavior, i.e. certain channels are on 
for one direction and off for another direction, the norms of the command vectors 
might hardly vary across different movement directions. In such a case, deviation 
cost might be preferable over amplitude cost. 
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